Abstract

We describe an approximate rational arithmetic in which the round-off errors
(both absolute and relative) are controlled by the user. The rounding procedure
is based on the continued fraction expansion of real numbers. Results of
computer experiments are given in order to compare the efficiency and accuracy
of different types of approximate arithmetics and rounding procedures.

Keywords: approximate rational arithmetic, continued fractions, round-off 
errors, multiple and arbitrary precision computations. 

1. Introduction. Problems of validity and reliability of calculations
(including the analysis of round-off errors) have become increasingly
important, partly due to the steady growth of computer power. Roughly
speaking, the main disadvantage of standard floating-point arithmetic is
that only the relative round-off error can be controlled during calculations.
In some cases (for example, in the summation of series and the subtraction of
nearly equal numbers) this disadvantage can lead to a loss of accuracy and even
to completely incorrect results. So, if the result of a calculation depends
critically on errors in the input data and on round-off errors (for example,
when solving ill-posed equations or studying the stability of solutions), then
it is reasonable to use calculations with multiple or even arbitrary precision.

An appealing way to improve the accuracy of calculations is to use different
versions of rational arithmetic, which work with rational numbers of the
form $p/q$, where $p$ and $q$ are integers ($q > 0$). It is possible to use
exact rational arithmetic (see, e.g., [?]), but, as a rule, it leads to an
explosive growth of both calculation time and storage space, since the
magnitudes (and lengths) of the numerators and denominators of computed numbers
grow very fast. Approximate rational arithmetics with fixed slash (the maximum
of the lengths of the numerator and the denominator is fixed) or floating slash
(the sum of these lengths is fixed) were investigated in detail earlier
(see [?], [3]). These rounding procedures use the continued fraction
representation. There also exist rougher rounding procedures, which keep only a
fixed number of leading digits (the other digits are replaced by zeros), but
this sometimes leads to relatively large rounding errors.

Here we suggest a new modification of approximate rational arithmetic
with a more natural and accurate rounding procedure. The user defines the
values $\Delta$ and $\delta$ of the absolute and relative error such that
$0 \le \Delta \le \infty$, $0 \le \delta \le \infty$. In particular, in the
case $\Delta = \delta = 0$ we obtain exact rational arithmetic. If
$\delta = \infty$, then only the absolute rounding error $\Delta$ is fixed. If
only the relative error $\delta$ is fixed, then we obtain approximately the
same picture as for floating-point arithmetic. The rounding procedure is
applied to a fraction if the lengths of its numerator and denominator exceed a
number $M$ specified by the user. In this case the initial rational number is
replaced by its best approximation in the form of a convergent of a continued
fraction within the given errors $\Delta$ and $\delta$. The result of rounding
is always an irreducible fraction, and sometimes it coincides with the initial
number.

This type of arithmetic was originally implemented by means of the REDUCE
computer algebra system and was used for constructing arbitrary rational
approximations to functions of one variable [?]. In this paper the analysis of
the accuracy and efficiency of different modifications of approximate rational
arithmetic is based on computer experiments implemented in the C++ language
(in an object-oriented form within the framework of the project outlined
in [?]). Note that Yu. V. Matijasevich suggested applying an approximate
rational arithmetic of arbitrary precision in his a posteriori interval
analysis [?].

Below we describe an algorithm for rounding and for constructing best
approximations (using the convergents of continued fractions). This method is
compared with other methods by means of a computer experiment. We also give an
estimate of the number of components of continued fractions as a function of
the accuracy of rounding. In particular, we show that an increase in the
accuracy of calculations does not lead to an explosive growth of calculation
time. The main result of the computer experiments is that this rounding
algorithm provides significantly higher accuracy of calculations than other
modifications of rational arithmetic that require comparable time.

2. Continued fractions and the approximate rational arithmetic.
Recall some basic notions of the theory of continued fractions. Denote by
$[a_0; a_1, a_2, \ldots]$ a continued fraction of the form

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} \tag{1}$$

Any non-negative rational number $p/q$ has a unique canonical representation in
the form of a finite continued fraction

$$\frac{p}{q} = [a_0; a_1, a_2, \ldots, a_n], \tag{2}$$

where all $a_i$ ($i = 0, \ldots, n$) are non-negative integers, $a_i \ge 1$ if
$i = 1, \ldots, n-1$, and $a_n \ge 2$ if $n \ge 1$. Irrational numbers can also
be represented in the form (1) as infinite continued fractions. A convergent of
the order $k$ of a continued fraction is defined for the decomposition (2) by
the equality

$$\frac{p_k}{q_k} = [a_0; a_1, a_2, \ldots, a_k], \tag{3}$$

where $k \le n$. It is clear from (3) that the convergent $p_n/q_n$ coincides
with $p/q$. From the theory of continued fractions [?] it is well known that
the convergent (3) is a best approximant to the number (2) in the following
sense: for any fraction $r/s$ such that $0 < s \le q_k$ and
$r/s \ne p_k/q_k$ the following inequality holds:

$$\left| \frac{r}{s} - \frac{p}{q} \right| > \left| \frac{p_k}{q_k} - \frac{p}{q} \right|.$$

The only (trivial) counterexample is $p/q = a_0 + \frac{1}{2}$; in this case
$a_0$ and $a_0 + 1$ approximate the number $p/q$ equally well. For any
convergent $p_k/q_k$ with $k \ge 0$ and $k < n$, the following inequality holds:

$$\frac{1}{q_k (q_k + q_{k+1})} < \left| \frac{p}{q} - \frac{p_k}{q_k} \right| \le \frac{1}{q_k q_{k+1}}. \tag{4}$$

A similar property holds for infinite continued fractions of the form (1)
corresponding to irrational numbers. Using the inequality (4) we obtain an
efficient algorithm for estimating the rounding error. These properties of
convergents make continued fractions an ideal tool for the construction of
approximate rational arithmetic.
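
As a small worked illustration (an example chosen here, not taken from the experiments of this paper), consider the fraction $355/113$. Its expansion and convergents can be checked by hand, and both sides of inequality (4) can be evaluated directly:

```latex
\frac{355}{113} = [3; 7, 16], \qquad
\frac{p_0}{q_0} = \frac{3}{1}, \quad
\frac{p_1}{q_1} = \frac{22}{7}, \quad
\frac{p_2}{q_2} = \frac{355}{113}.

k = 0: \quad
\frac{1}{q_0 (q_0 + q_1)} = \frac{1}{8}
  < \left| \frac{355}{113} - \frac{3}{1} \right| = \frac{16}{113}
  \le \frac{1}{q_0 q_1} = \frac{1}{7},

k = 1: \quad
\frac{1}{q_1 (q_1 + q_2)} = \frac{1}{840}
  < \left| \frac{355}{113} - \frac{22}{7} \right| = \frac{1}{791}
  \le \frac{1}{q_1 q_2} = \frac{1}{791}.
```

For $k = 1 = n - 1$ the upper bound is attained, which is always the case for the last convergent before the fraction itself.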

Recall the following algorithm for constructing the convergents of a
continued fraction [?]. Let the initial fraction be $p/q$, where $p$, $q$ are
integers, $p \ge 0$, $q \ge 1$ (if $p/q < 0$, then we work with the fraction
$|p/q|$ and then multiply the result by $-1$).

• Let the initial condition be defined by $b_{-2} = p$, $b_{-1} = q$,
$p_{-2} = 0$, $p_{-1} = 1$, $q_{-2} = 1$, $q_{-1} = 0$.

• For $i = 0, 1, 2, \ldots$, the values of $a_i$ and $b_i$ are consecutively
computed as the quotient and the remainder obtained when $b_{i-2}$ is divided
by $b_{i-1}$, respectively:

$$b_{i-2} = a_i b_{i-1} + b_i.$$

• The numerator and the denominator of the convergent of order $i$ of the
continued fraction are given in the recurrent form:

$$p_i = a_i p_{i-1} + p_{i-2},$$

$$q_i = a_i q_{i-1} + q_{i-2}.$$

• If $b_i = 0$, then the convergent of the continued fraction coincides with
the initial fraction $p/q$ and the procedure terminates.

• At each step (for each $i = 0, 1, \ldots$), a criterion of accuracy (see
below) is checked, and if the result satisfies the criterion of accuracy then
the procedure terminates; otherwise we perform the next step with $i := i + 1$.

As a criterion of accuracy we can choose one of the following conditions:

1) the absolute error is less than $\Delta$;

2) the relative error is less than $\delta$;

3) both conditions 1) and 2) are satisfied.

Note that inequality (4) makes it possible to check the absolute error without
a direct comparison with the initial number. This rounding algorithm can be
applied to the result of any arithmetic operation if the lengths of its
numerator and denominator exceed a given threshold $M$. Of course, the values
of the parameters $\Delta$, $\delta$, $M$ are given by the user.
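
The steps of this section can be sketched in Python as follows. This is a minimal illustration, not the C++ or REDUCE implementation used in the paper: the function name `round_by_convergents` and its parameters `abs_err` and `rel_err` are hypothetical, the errors are checked by direct comparison with the initial number rather than through inequality (4), and the length threshold M is left out.

```python
from fractions import Fraction
from math import inf


def round_by_convergents(x, abs_err=inf, rel_err=inf):
    """Return the first convergent of x whose absolute error is below
    abs_err and whose relative error is below rel_err (hypothetical helper)."""
    sign = -1 if x < 0 else 1
    x = abs(Fraction(x))
    b_prev2, b_prev1 = x.numerator, x.denominator  # b_{-2} = p, b_{-1} = q
    p_prev2, p_prev1 = 0, 1                        # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0                        # q_{-2}, q_{-1}
    while True:
        # Quotient a_i and remainder b_i of b_{i-2} divided by b_{i-1}.
        a, b = divmod(b_prev2, b_prev1)
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        convergent = Fraction(p, q)
        if b == 0:
            # The convergent coincides with the initial fraction.
            return sign * convergent
        # Assumption of this sketch: direct comparison with x.
        err = abs(x - convergent)
        if err < abs_err and err < rel_err * x:
            return sign * convergent
        b_prev2, b_prev1 = b_prev1, b
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q


print(round_by_convergents(Fraction(355, 113), abs_err=Fraction(1, 100)))  # 22/7
print(round_by_convergents(Fraction(355, 113), abs_err=0))                 # 355/113
```

With `abs_err=0` the accuracy test never succeeds, so the loop runs until the remainder vanishes and the exact fraction is returned, which corresponds to the exact rational arithmetic.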

3. Estimations of round-off errors. As a consequence of the recurrence
formula for the denominator, the values $q_k$ are minimal for each fixed $k$ if
$a_i = 1$ for $i = 0, 1, \ldots, k$, i.e. $q_i = q_{i-1} + q_{i-2}$,
$q_{-2} = 1$, $q_{-1} = 0$, $q_0 = q_{-1} + q_{-2} = 1$. Therefore for any
convergent $p_k/q_k$ we obtain the inequality

$$q_k \ge F_{k+1}, \tag{5}$$

where $F_k$ are the Fibonacci numbers defined by the recurrence formula
$F_k = F_{k-1} + F_{k-2}$ ($k \ge 2$), where $F_0 = 0$, $F_1 = 1$. It is well
known that the Fibonacci numbers are expressed by the formula

$$F_k = \frac{1}{\sqrt{5}} \left( \Phi^k - \hat{\Phi}^k \right), \tag{6}$$

where $\Phi = \frac{1}{2}(1 + \sqrt{5}) \approx 1.618$ ("golden section"),
$\hat{\Phi} = \frac{1}{2}(1 - \sqrt{5}) \approx -0.618$. It is also well known
that the continued fraction expansion of $\Phi$ converges more slowly than that
of any other number. From (5) and (6) it follows that

$$q_k q_{k+1} \ge F_{k+1} F_{k+2} = \frac{1}{5} \left( \Phi^{k+1} - \hat{\Phi}^{k+1} \right) \left( \Phi^{k+2} - \hat{\Phi}^{k+2} \right) =$$

$$= \frac{1}{5} \left( \Phi^{2k+3} - (\Phi \hat{\Phi})^{k+1} (\Phi + \hat{\Phi}) + \hat{\Phi}^{2k+3} \right) = \frac{1}{5} \left( \Phi^{2k+3} + (-1)^k + \hat{\Phi}^{2k+3} \right),$$

since $\Phi \hat{\Phi} = -1$ and $\Phi + \hat{\Phi} = 1$. Using the values of
$\Phi$ and $\hat{\Phi}$, we obtain for arbitrary $k$ the estimate
$q_k q_{k+1} > \frac{1}{5} \Phi^{2k+2}$, while for even $k$:
$q_k q_{k+1} > \frac{1}{5} \Phi^{2k+3}$. From these estimates and the
inequalities (4) it follows that the absolute round-off error is less than
$\Delta$ if

$$k \ge \frac{1}{2} \log_\Phi \frac{5}{\Delta} - \frac{3}{2}$$

for even $k$ and

$$k \ge \frac{1}{2} \log_\Phi \frac{5}{\Delta} - 1$$

for odd $k$.
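
The closed form for $F_{k+1} F_{k+2}$ and the two lower bounds derived from it can be checked numerically. The following Python sketch does this for small $k$; the helper names `fib` and `product_closed_form` are introduced here for illustration only, and floating-point arithmetic is assumed to be accurate enough for the small indices used.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2      # golden section, about 1.618
PHI_HAT = (1 - sqrt(5)) / 2  # conjugate root, about -0.618


def fib(n):
    """Fibonacci number F_n with F_0 = 0, F_1 = 1 (exact integer arithmetic)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def product_closed_form(k):
    """Right-hand side (Phi^(2k+3) + (-1)^k + PhiHat^(2k+3)) / 5."""
    return (PHI ** (2 * k + 3) + (-1) ** k + PHI_HAT ** (2 * k + 3)) / 5


for k in range(8):
    exact = fib(k + 1) * fib(k + 2)
    general_bound = PHI ** (2 * k + 2) / 5  # valid for every k
    even_bound = PHI ** (2 * k + 3) / 5     # valid for even k only
    print(k, exact, round(product_closed_form(k), 6),
          exact > general_bound, k % 2 == 1 or exact > even_bound)
```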

4. Estimations of the number of iterations. These relations lead to
upper estimates for the number of iterations required for the approximation
of a rational fraction with an absolute error smaller than $\Delta$. Let
$p_k/q_k$ be a convergent of a continued fraction that lies within the required
error, while the convergent $p_{k-1}/q_{k-1}$ does not yet give the required
accuracy. Thus $k$ is the number of iterations necessary to obtain the required
accuracy.

For any real number $r$ denote by $\lfloor r \rfloor$ the floor of $r$ and by
$\lceil r \rceil$ the ceiling of $r$. So $\lceil r \rceil$ is the least integer
that is greater than or equal to $r$; similarly, $\lfloor r \rfloor$ is the
integer part of $r$, i.e. the largest integer that is less than or equal to
$r$. Hence if $r$ is an integer, then $\lfloor r \rfloor = \lceil r \rceil$,
and otherwise $\lceil r \rceil = \lfloor r \rfloor + 1 = \lfloor r + 1 \rfloor$.
Then the required upper estimate has the following form:

$$k \le \left\lceil \frac{1}{2} \log_\Phi \frac{5}{\Delta} - 1 \right\rceil \le \left\lfloor \frac{1}{2} \log_\Phi \frac{5}{\Delta} \right\rfloor. \tag{7}$$

But if the number
$\left\lceil \frac{1}{2} \log_\Phi \frac{5}{\Delta} - \frac{3}{2} \right\rceil$
is even, the estimate (7) can be strengthened:

$$k \le \left\lceil \frac{1}{2} \log_\Phi \frac{5}{\Delta} - \frac{3}{2} \right\rceil \le \left\lfloor \frac{1}{2} \log_\Phi \frac{5}{\Delta} - \frac{1}{2} \right\rfloor. \tag{8}$$

In the case of $\Delta = 10^{-N}$ the estimates (7) and (8) give the following
result:

Theorem. If the absolute error specified in the criterion of accuracy of
rounding has the form $\Delta = 10^{-N}$, then

$$k \le \lfloor a + bN \rfloor, \tag{9}$$

where $a = \frac{1}{2} \log_\Phi 5 \approx 1.672$ and
$b = \frac{1}{2} \log_\Phi 10 \approx 2.392$. If the number
$\lceil a + bN - \frac{3}{2} \rceil$ is even, the estimate (9) can be
strengthened:

$$k \le \left\lfloor a - \frac{1}{2} + bN \right\rfloor. \tag{10}$$

For example, if $N = 8$, then the estimate (9) shows that $k \le 20$; for
$N = 9$ this estimate gives $k \le 23$, but in this case the estimate (10) is
applicable, so $k \le 22$.
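
The bounds of the Theorem are easy to evaluate. The short Python sketch below reproduces the two numerical examples; the function names `bound_9` and `bound_theorem` are illustrative labels for the estimates (9) and (10), not names from the paper.

```python
from math import ceil, floor, log, sqrt

PHI = (1 + sqrt(5)) / 2
A = 0.5 * log(5, PHI)   # a = (1/2) log_Phi 5, about 1.672
B = 0.5 * log(10, PHI)  # b = (1/2) log_Phi 10, about 2.392


def bound_9(n_digits):
    """Estimate (9): k <= floor(a + b N) for Delta = 10^(-N)."""
    return floor(A + B * n_digits)


def bound_theorem(n_digits):
    """Estimate (9), strengthened by (10) whenever ceil(a + b N - 3/2) is even."""
    x = A + B * n_digits
    if ceil(x - 1.5) % 2 == 0:
        return floor(x - 0.5)
    return floor(x)


print(bound_9(8), bound_theorem(8))  # 20 20
print(bound_9(9), bound_theorem(9))  # 23 22
```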

Note that these estimates depend only on the absolute error $\Delta$ and do
not depend on the initial numbers (i.e. the numbers being rounded). In fact
(see below), the number of iterations is usually much less than the right-hand
sides of these inequalities. Since the number of iterations is bounded by a
linear function of the logarithm of the absolute error, an increase in the
accuracy of calculations does not lead to an explosive growth of calculation
time.

Heuristically, it is easy to estimate the mean value $\bar{k}$ of the parameter
$k$ for a fixed absolute error $\Delta$. Consider a convergent $p_k/q_k$ of a
continued fraction as an approximation to a real number $x$. A. Ya. Khinchin
investigated the convergents $p_k/q_k$ of continued fractions for real numbers
and proved that for almost all $x$ the relation
$\lim_{k \to \infty} \sqrt[k]{q_k} = \gamma$ is valid, where $\gamma$ is a
constant (see [?]). P. Lévy ([?], p. 320) showed that
$\ln \gamma = \frac{\pi^2}{12 \ln 2} \approx 1.18657\ldots$, i.e.
$\gamma \approx 3.27582\ldots$ Roughly speaking, this result means that if the
values of $k$ are sufficiently large, then the denominator $q_k$ of a
convergent is "close" to $\gamma^k$.

Leaving mathematical rigor aside for a moment, substitute the quantities
$\gamma^k$ and $\gamma^{k+1}$ into (4) for $q_k$ and $q_{k+1}$ in order to
estimate the mean order of the convergent with a given approximation error
$\Delta$. As an upper bound we obtain the number
$\frac{\ln(1/\Delta)}{2 \ln \gamma} - \frac{1}{2}$, and the lower bound differs
from the upper bound by the value
$\frac{\ln(1 + \gamma^{-1})}{2 \ln \gamma} \approx 0.11$. Thus, the mean value
of $k$ (not necessarily an integer) is close to
$\frac{\ln(1/\Delta)}{2 \ln \gamma}$. If $\Delta = 10^{-N}$, then

$$\bar{k} \approx \frac{\ln(1/\Delta)}{2 \ln \gamma} = \frac{N \ln 10}{2 \ln \gamma} \approx 0.97 \cdot N \approx N. \tag{11}$$

This estimate becomes realistic only for large values of $N$; otherwise
$\bar{k}$ is much less than $N$.
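
The constants of this heuristic argument can be recomputed in a few lines of Python. The sketch assumes only the value $\ln \gamma = \pi^2 / (12 \ln 2)$ quoted above; the names `LN_GAMMA`, `GAMMA`, `BOUND_GAP` and `mean_iterations` are introduced here for illustration.

```python
from math import exp, log, pi

LN_GAMMA = pi ** 2 / (12 * log(2))  # about 1.18657
GAMMA = exp(LN_GAMMA)               # about 3.27582

# Distance between the heuristic upper and lower bounds for the mean order.
BOUND_GAP = log(1 + 1 / GAMMA) / (2 * LN_GAMMA)


def mean_iterations(n_digits):
    """Heuristic estimate (11) of the mean number of iterations
    for Delta = 10^(-N)."""
    return n_digits * log(10) / (2 * LN_GAMMA)


print(round(LN_GAMMA, 5), round(GAMMA, 5), round(BOUND_GAP, 2))
print(round(mean_iterations(1), 2))  # coefficient of N in (11)
```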

5. Examples of applications of different variants of approximate
rational arithmetic. To compare different variants of rational arithmetic,
consider a classical example: the numerical calculation of the function
$\sin x$ at the points $x_m = \frac{\pi}{6} + 2\pi m$ by summation of its
Taylor series. The sum is calculated until the absolute value of a summand
becomes less than $10^{-7}$. The number $\pi$ is replaced by its rational
approximation $\frac{355}{113}$ with an absolute error $2.7 \cdot 10^{-7}$.
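
The model problem can be sketched in exact rational arithmetic (no rounding) with Python's `fractions` module. This is only an illustration of the setup, not the C++ code used for the measurements: the helper names are hypothetical, and it is an assumption of the sketch that the first summand smaller than $10^{-7}$ is not added to the sum.

```python
from fractions import Fraction

PI_APPROX = Fraction(355, 113)  # rational stand-in for pi
TOL = Fraction(1, 10 ** 7)      # stop once a summand drops below 10^-7


def x_m(m):
    """Argument pi/6 + 2*pi*m with pi replaced by 355/113."""
    return PI_APPROX / 6 + 2 * PI_APPROX * m


def sin_taylor(x, tol=TOL):
    """Sum the Taylor series of sin x exactly, term by term."""
    term = x  # current summand, starting with x^1 / 1!
    total = Fraction(0)
    n = 1
    while abs(term) >= tol:
        total += term
        term = -term * x * x / ((n + 1) * (n + 2))
        n += 2
    return total


def length_sum(value):
    """Sum of the decimal lengths of numerator and denominator."""
    return len(str(abs(value.numerator))) + len(str(value.denominator))


for m in range(3):
    result = sin_taylor(x_m(m))
    print(m, float(abs(result - Fraction(1, 2))), length_sum(result))
```

The printed length of the result grows quickly with m, which is the effect that motivates rounding the intermediate fractions.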

A 500 MHz Intel Pentium III processor was used for the calculations. The
different variants of rational arithmetic were built on top of an arbitrary
precision arithmetic implemented in the C++ programming language (an
implementation by means of the REDUCE system gives similar results).

We consider the following variants of approximate rational arithmetic:

I) The arbitrary precision arithmetic (without rounding).

II) The approximate rational arithmetic described in Section 2 with $M = 9$,
$\Delta = 10^{-8}$, $\delta = \infty$ (so only the absolute round-off error
$\Delta = 10^{-8}$ is fixed).

III) The same arithmetic with $M = 9$, $\Delta = \delta = 10^{-8}$.

IV) The same arithmetic with $M = 9$, $\Delta = \infty$, $\delta = 10^{-8}$
(so only the relative round-off error is fixed).

V) Fixed-slash arithmetic [?], where the maximum length $L$ of the numerator
and denominator is fixed by $L = 6$.

VI) The same arithmetic with $L = 9$.

VII) The same arithmetic with $L = 12$.

VIII) Floating-slash arithmetic [?], where the maximum sum $S$ of the lengths
of the numerator and denominator is fixed by $S = 12$.

IX) The same arithmetic with $S = 15$.

X) The same arithmetic with $S = 18$.

XI) A "reductive" arithmetic, where the number $D$ of true leading digits of
the numerator and denominator is fixed by $D = 9$ (the other digits are
replaced by zeros).

Variant           m = 0     m = 1     m = 2     m = 3     m = 4     m = 5     m = 6
I            ε    4·10^-8   4·10^-7   8·10^-7   10^-6     2·10^-6   3·10^-6   5·10^-6
             s    62        214       372       504       650       810       980
             t    0.007     0.09      0.24      0.95      3.3       5.7       17
II           ε    2·10^-8   5·10^-7   10^-6     10^-6     2·10^-6   2·10^-6   3·10^-6
Δ = 10^-8    s    16        13        12        12        12        12        11
             t    0.007     0.08      0.15      0.21      0.28      0.34      0.42
III          ε    4·10^-8   5·10^-7   10^-6     10^-6     2·10^-6   2·10^-6   3·10^-6
Δ = 10^-8    s    15        13        12        12        12        12        11
δ = 10^-8    t    0.012     0.09      0.16      0.23      0.32      0.37      0.46
IV           ε    4·10^-8   3·10^-7   3·10^-4   0.21      0.6       0.8       1.17
δ = 10^-8    s    15        13        9         9         9         8         8
             t    0.014     0.06      0.14      0.18      0.25      0.29      0.34
V            ε    0         0         10^-3     0.7       1.0       1.4       3.4
L = 6        s    2         2         12        12        12        11        12
             t    0.017     0.12      0.18      0.19      0.21      0.23      0.28
VI           ε    4·10^-8   5·10^-7   10^-6     0.008     0.08      0.3       0.6
L = 9        s    17        18        17        18        18        16        18
             t    0.025     0.21      0.29      0.31      0.33      0.36      0.39
VII          ε    4·10^-8   5·10^-7   10^-6     10^-6     10^-6     2·10^-4   0.007
L = 12       s    24        24        23        23        24        23        24
             t    0.05      0.29      0.49      0.56      0.64      0.65      0.67
VIII         ε    4·10^-8   5·10^-7   2·10^-6   10^-4     0.04      0.06      0.8
S = 12       s    11        11        10        11        11        10        11
             t    0.013     0.18      0.34      0.48      0.51      0.52      0.54
IX           ε    4·10^-8   5·10^-7   10^-6     6·10^-6   2·10^-3   0.01      0.4
S = 15       s    14        14        13        13        14        13        13
             t    0.05      0.21      0.37      0.41      0.56      0.61      0.64
X            ε    4·10^-8   5·10^-7   10^-6     2·10^-6   3·10^-6   3·10^-5   0.01
S = 18       s    17        17        15        17        16        17        17
             t    0.023     0.31      0.51      0.68      0.75      0.85      0.87
XI           ε              4·10^-5   0.04      0.1       0.2       0.4       0.9
D = 9        s    18        18        18        18        18        18        18
             t    0.008     0.04      0.12      0.17      0.26      0.33      0.42

Table 1. Comparison of rational arithmetics

The results of the numerical computations are presented in Table 1. For the
calculation of the function $\sin(\frac{\pi}{6} + 2\pi m)$ at the points
$m = 0, 1, 2, 3, 4, 5, 6$, Table 1 gives the absolute error $\varepsilon$ of
the result (the relative error equals $2\varepsilon$), the calculation time
$t$ in seconds, and the sum $s$ of the lengths of the numerator and denominator
of the result. All error values are rounded to the first significant digit to
fit them into the format of the table.

It follows from Table 1 that in the case of the arbitrary precision arithmetic
(variant I) the quick increase of the parameter $s$ leads to an explosive
growth of calculation time. Curiously enough, the arbitrary precision
arithmetic does not always give the most accurate result. This phenomenon can
be partly explained by the inaccuracy of the representation of the number
$\pi$, but it is also a consequence of the unusually simple form of the
rational value $\sin(\frac{\pi}{6} + 2\pi m) = \frac{1}{2}$. The nature of this
effect is discussed in [?]. If only the relative error of rounding is fixed
(variant IV), the situation characteristic of floating-point arithmetic
(variant XI) is repeated completely: starting at $m = 4$ the errors are larger
than half of the computed values (see [?], part 3).

For the other types of rational arithmetic, the growing difficulty of the
calculations substantially damages the accuracy of the result, in contrast to
variants II and III of approximate rational arithmetic. On the other hand, the
calculation time is comparable in all cases except the arbitrary precision
arithmetic.



N         16     18     20     22     24     26     28     30     32     34     36
mean k    13.9   16.4   18.9   20.9   22.7   24.7   26.5   28.4   30.9   33.0   34.9

Table 2. Dependence of the mean number of iterations on the accuracy of rounding

Table 2 presents the dependence of the mean number of iterations $\bar{k}$ (see
above, Section 4) on the absolute rounding error $\Delta = 10^{-N}$ for
$N = 16, 18, \ldots, 36$. The calculations were carried out for the model
example described above with $M = 9$. It follows from Table 2 that the estimate
(11) is realistic for $\bar{k}$ if $N$ is sufficiently large.

6. Conclusion. The approximate rational arithmetic described in Section 2
provides a significantly higher degree of accuracy of calculations in a time
comparable to that of other types of rational arithmetic. This result is
illustrated quite clearly by the calculations above. It is particularly
important that the round-off error can be controlled by the user at each step
of the calculation procedure. This allows us to control the inaccuracy of
rounding, estimate the maximum computing error beforehand, and guarantee (in
particular, in terms of interval analysis) the required accuracy of
calculations.